102 research outputs found
Unveiling the Power of Deep Tracking
In the field of generic object tracking, numerous attempts have been made to
exploit deep features. Despite all expectations, deep trackers are yet to reach
an outstanding level of performance compared to methods solely based on
handcrafted features. In this paper, we investigate this key issue and propose
an approach to unlock the true potential of deep features for tracking. We
systematically study the characteristics of both deep and shallow features, and
their relation to tracking accuracy and robustness. We identify the limited
data and low spatial resolution as the main challenges, and propose strategies
to counter these issues when integrating deep features for tracking.
Furthermore, we propose a novel adaptive fusion approach that leverages the
complementary properties of deep and shallow features to improve both
robustness and accuracy. Extensive experiments are performed on four
challenging datasets. On VOT2017, our approach significantly outperforms the
top-performing tracker from the challenge, with a relative gain of 17% in EAO.
Adiabatic Quantum Computing for Multi Object Tracking
Multi-Object Tracking (MOT) is most often approached in the tracking-by-detection paradigm, where object detections are associated through time. The association step naturally leads to discrete optimization problems. As these optimization problems are often NP-hard, they can only be solved exactly for small instances on current hardware. Adiabatic quantum computing (AQC) offers a solution for this, as it has the potential to provide a considerable speedup on a range of NP-hard optimization problems in the near future. However, current MOT formulations are unsuitable for quantum computing due to their scaling properties. In this work, we therefore propose the first MOT formulation designed to be solved with AQC. We employ an Ising model that represents the quantum mechanical system implemented on the AQC. We show that our approach is competitive compared with state-of-the-art optimization-based approaches, even when using off-the-shelf integer programming solvers. Finally, we demonstrate that our MOT problem is already solvable on the current generation of real quantum computers for small examples, and analyze the properties of the measured solutions.
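As a rough illustration of the kind of formulation the abstract describes, the sketch below casts a tiny two-detection association problem as a QUBO/Ising-style energy and minimizes it by brute force (standing in for the annealer). The affinity values and penalty weight are invented for illustration; this is not the paper's actual model.

```python
# Hypothetical sketch: frame-to-frame detection association as a QUBO/Ising
# energy of the sort an adiabatic quantum computer minimizes. All numbers
# below are illustrative assumptions, not the paper's model.
import itertools
import numpy as np

# affinity[i][j]: similarity of detection i (frame t) to detection j (frame t+1)
affinity = np.array([[0.9, 0.1],
                     [0.2, 0.8]])
P = 2.0  # penalty weight enforcing a one-to-one assignment

def energy(x):
    # x is the 2x2 binary assignment matrix, flattened row-major.
    x = np.asarray(x).reshape(2, 2)
    e = -np.sum(affinity * x)                  # reward strong matches
    e += P * np.sum((x.sum(axis=0) - 1) ** 2)  # each column used exactly once
    e += P * np.sum((x.sum(axis=1) - 1) ** 2)  # each row used exactly once
    return e

# Brute force over all 2^4 binary states stands in for the quantum annealer.
best = min(itertools.product([0, 1], repeat=4), key=energy)
print(best)  # (1, 0, 0, 1): detection 0 -> 0, detection 1 -> 1
```

The one-to-one constraints enter the objective as quadratic penalty terms, which is exactly what makes the problem expressible as an Ising model.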
Measuring the Accuracy of Object Detectors and Trackers
The accuracy of object detectors and trackers is most commonly evaluated by
the Intersection over Union (IoU) criterion. To date, most approaches are
restricted to axis-aligned or oriented boxes and, as a consequence, many
datasets are only labeled with boxes. Nevertheless, axis-aligned or oriented
boxes cannot accurately capture an object's shape. To address this, a number of
densely segmented datasets have started to emerge in both the object detection
and the object tracking communities. However, evaluating the accuracy of object
detectors and trackers that are restricted to boxes on densely segmented data
is not straightforward. To close this gap, we introduce the relative
Intersection over Union (rIoU) accuracy measure. The measure normalizes the IoU
with the optimal box for the segmentation to generate an accuracy measure that
ranges between 0 and 1 and allows a more precise measurement of accuracies.
Furthermore, it enables an efficient and easy way to understand scenes and the
strengths and weaknesses of an object detection or tracking approach. We
display how the new measure can be efficiently calculated and present an
easy-to-use evaluation framework. The framework is tested on the DAVIS and the
VOT2016 segmentations and has been made available to the community.
Comment: 10 pages, 7 figures
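The normalization idea behind rIoU can be sketched in a few lines: divide the predicted box's IoU with the segmentation by the IoU of the best possible axis-aligned box, found here by brute force on a tiny grid (the actual framework computes the optimal box far more efficiently). The toy mask and box are invented for illustration.

```python
# Minimal sketch of relative IoU (rIoU): IoU of a predicted box against a
# dense segmentation mask, normalized by the best achievable box IoU.
import numpy as np

def box_mask(shape, x0, y0, x1, y1):
    m = np.zeros(shape, dtype=bool)
    m[y0:y1, x0:x1] = True
    return m

def iou(a, b):
    return np.logical_and(a, b).sum() / np.logical_or(a, b).sum()

def optimal_box_iou(seg):
    # Exhaustive search over all axis-aligned boxes on a small grid.
    h, w = seg.shape
    best = 0.0
    for y0 in range(h):
        for y1 in range(y0 + 1, h + 1):
            for x0 in range(w):
                for x1 in range(x0 + 1, w + 1):
                    best = max(best, iou(box_mask(seg.shape, x0, y0, x1, y1), seg))
    return best

# A diagonal (non-box-shaped) object on an 8x8 grid.
seg = np.zeros((8, 8), dtype=bool)
for i in range(8):
    seg[i, max(0, i - 1):i + 2] = True

pred = box_mask(seg.shape, 0, 0, 8, 8)        # a full-image box prediction
riou = iou(pred, seg) / optimal_box_iou(seg)  # in [0, 1]; 1 = as good as any box can be
```

Because the denominator is the best box IoU rather than 1, a box-only tracker is no longer penalized for the part of the object that no box could ever capture.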
Hypernetwork functional image representation
Motivated by the human way of memorizing images, we introduce their functional
representation, where an image is represented by a neural network. For this
purpose, we construct a hypernetwork which takes an image and returns the
weights of a target network that maps points from the plane (representing
pixel positions) to their corresponding colors in the image. Since the obtained
representation is continuous, one can easily inspect the image at various
resolutions and perform on it arbitrary continuous operations. Moreover, by
inspecting interpolations we show that such representation has some properties
characteristic of generative models. To evaluate the proposed mechanism
experimentally, we apply it to the image super-resolution problem. Despite using
a single model for various scaling factors, we obtained results comparable to
existing super-resolution methods.
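The data flow described above can be sketched with untrained random linear maps standing in for both networks: a "hypernetwork" produces the weight vector of a tiny coordinate-to-color MLP, which can then be queried at any resolution. All sizes and the architecture are illustrative assumptions, not the paper's.

```python
# Sketch of a functional image representation: a hypernetwork maps an image to
# the weights of a small target MLP, and that MLP maps continuous (x, y)
# coordinates to a color. Both nets are untrained random maps here -- this
# shows the data flow only, not a trained model.
import numpy as np

rng = np.random.default_rng(0)
H = W = 16
hidden = 8
n_weights = (2 + 1) * hidden + (hidden + 1) * 3  # target MLP: 2 -> hidden -> 3

# "Hypernetwork": one linear layer from the flattened image to the weight vector.
hyper_W = rng.normal(scale=0.01, size=(n_weights, H * W))

def target_net(weights, xy):
    # Unpack the flat weight vector into the tiny coordinate->color MLP.
    w1 = weights[: 2 * hidden].reshape(hidden, 2)
    b1 = weights[2 * hidden: 3 * hidden]
    rest = weights[3 * hidden:]
    w2 = rest[: hidden * 3].reshape(3, hidden)
    b2 = rest[hidden * 3:]
    h = np.tanh(xy @ w1.T + b1)
    return h @ w2.T + b2  # one RGB value per input coordinate

image = rng.random((H, W))         # a grayscale "input image"
weights = hyper_W @ image.ravel()  # hypernetwork predicts the target weights

# The representation is continuous, so any resolution can be queried.
for res in (16, 64):
    ys, xs = np.mgrid[0:1:res * 1j, 0:1:res * 1j]
    coords = np.stack([xs.ravel(), ys.ravel()], axis=1)
    rgb = target_net(weights, coords).reshape(res, res, 3)
    print(rgb.shape)  # (16, 16, 3) then (64, 64, 3)
```

The point of the construction is the last loop: once the weights exist, resampling the image at a new resolution is just evaluating the MLP on a denser coordinate grid.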
Siamese network based features fusion for adaptive visual tracking
© Springer Nature Switzerland AG 2018. Visual object tracking is a popular but challenging problem in computer vision. The main challenge is the lack of prior knowledge of the tracking target, which may be supervised only by a bounding box given in the first frame. Besides, tracking suffers from many influences such as scale variations, deformations, partial occlusions and motion blur. To handle such a challenging problem, a tracking framework is needed that can adapt to different tracking scenes. This paper presents a novel approach for robust visual object tracking by fusing multiple features in a Siamese network. Hand-crafted appearance features and CNN features are combined to mutually compensate for each other's shortcomings and reinforce each other's advantages. The proposed network proceeds as follows. Firstly, different features are extracted from the tracking frames. Secondly, each extracted feature is passed through a correlation filter to learn a corresponding template, which is used to generate a response map. Finally, the multiple response maps are fused into a better response map, which helps to locate the target more accurately. Comprehensive experiments are conducted on three benchmarks: Temple-Color, OTB50 and UAV123. Experimental results demonstrate that the proposed approach achieves state-of-the-art performance on these benchmarks.
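The fusion step can be illustrated with a minimal sketch: per-feature correlation responses are combined with adaptive weights before taking the peak. The PSR-based weighting below is one common confidence heuristic for correlation filters, used here as an illustrative choice rather than the paper's exact fusion rule; the response maps are synthetic.

```python
# Hedged sketch of adaptive response-map fusion for correlation filter (CF)
# tracking: weight each feature's response by its peak-to-sidelobe ratio (PSR),
# then locate the target at the fused peak. Data and weighting are illustrative.
import numpy as np

def psr(response):
    # Peak-to-sidelobe ratio: a standard confidence proxy for CF responses.
    peak = response.max()
    side = response[response < peak]
    return (peak - side.mean()) / (side.std() + 1e-8)

def fuse(responses):
    weights = np.array([psr(r) for r in responses])
    weights /= weights.sum()
    return sum(w * r for w, r in zip(weights, responses))

rng = np.random.default_rng(1)
shape = (31, 31)
# Hand-crafted-feature response: sharp, confident peak at (10, 12).
r_hog = rng.random(shape) * 0.1
r_hog[10, 12] = 1.0
# CNN-feature response: broader, noisier peak one pixel away.
r_cnn = rng.random(shape) * 0.4
r_cnn[11, 12] = 0.9

fused = fuse([r_hog, r_cnn])
target = tuple(int(i) for i in np.unravel_index(fused.argmax(), shape))
print(target)  # (10, 12): the confident response dominates the fusion
```

The adaptive weights let whichever feature is currently more reliable dominate the localization, which is the complementarity the abstract argues for.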
Long-Term Visual Object Tracking Benchmark
We propose a new long video dataset (called Track Long and Prosper - TLP) and
benchmark for single object tracking. The dataset consists of 50 HD videos from
real world scenarios, encompassing a duration of over 400 minutes (676K
frames), making it more than 20-fold larger in average duration per sequence
and more than 8-fold larger in total covered duration, as compared to
existing generic datasets for visual tracking. The proposed dataset paves the way
to suitably assess long term tracking performance and train better deep
learning architectures (avoiding/reducing augmentation, which may not reflect
real world behaviour). We benchmark the dataset on 17 state-of-the-art trackers
and rank them according to tracking accuracy and run time speeds. We further
present thorough qualitative and quantitative evaluation highlighting the
importance of long term aspect of tracking. Our most interesting observations
are (a) existing short-sequence benchmarks fail to bring out the inherent
differences between tracking algorithms, which widen while tracking on long
sequences, and (b) the accuracy of trackers drops abruptly on challenging long
sequences, suggesting the need for research efforts in the direction of
long-term tracking.
Comment: ACCV 2018 (Oral)
Learning Rotation Adaptive Correlation Filters in Robust Visual Object Tracking
Visual object tracking is one of the major challenges in the field of
computer vision. Correlation Filter (CF) trackers are one of the most widely
used categories in tracking. Though numerous tracking algorithms based on CFs
are available today, most of them fail to efficiently detect the object in an
unconstrained environment with dynamically changing object appearance. In order
to tackle such challenges, the existing strategies often rely on a particular
set of algorithms. Here, we propose a robust framework that offers the
provision to incorporate illumination and rotation invariance in the standard
Discriminative Correlation Filter (DCF) formulation. We also supervise the
detection stage of DCF trackers by eliminating false positives in the
convolution response map. Further, we demonstrate the impact of displacement
consistency on CF trackers. The generality and efficiency of the proposed
framework is illustrated by integrating our contributions into two
state-of-the-art CF trackers: SRDCF and ECO. As per the comprehensive
experiments on the VOT2016 dataset, our top trackers show substantial
improvements of 14.7% and 6.41% in robustness, and 11.4% and 1.71% in Average
Expected Overlap (AEO), over the baseline SRDCF and ECO, respectively.
Comment: Published in ACCV 201
TADA: Taxonomy Adaptive Domain Adaptation
Traditional domain adaptation addresses the task of adapting a model to a novel target domain under limited or no additional supervision. While tackling the input domain gap, standard domain adaptation settings assume no domain change in the output space. In semantic prediction tasks, however, different datasets are often labeled according to different semantic taxonomies. In many real-world settings, the target-domain task requires a different taxonomy than the one imposed by the source domain. We therefore introduce the more general taxonomy adaptive domain adaptation (TADA) problem, allowing for inconsistent taxonomies between the two domains. We further propose an approach that jointly addresses image-level and label-level domain adaptation. On the label level, we employ a bilateral mixed sampling strategy to augment the target domain, and a relabelling method to unify and align the label spaces. We address the image-level domain gap by proposing an uncertainty-rectified contrastive learning method, leading to more domain-invariant and class-discriminative features. We extensively evaluate the effectiveness of our framework under different TADA settings: open taxonomy, coarse-to-fine taxonomy, and partially-overlapping taxonomy. Our framework outperforms the previous state of the art by a large margin, while being capable of adapting to target taxonomies.
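A minimal sketch of the label-level problem TADA targets: two datasets whose inconsistent taxonomies are relabelled into a unified label space, with unmapped classes ignored. All class names and mappings below are invented for illustration and are not the paper's taxonomies.

```python
# Toy illustration of taxonomy unification: map dataset-specific class names
# into a shared label space before joint training. Everything here is invented.
SOURCE_TO_UNIFIED = {
    "vehicle": "vehicle",   # identical class in both taxonomies
    "person": "person",
    "road": "drivable",     # renamed in the unified taxonomy
}
TARGET_TO_UNIFIED = {
    "car": "vehicle",       # coarse-to-fine: the target splits "vehicle"
    "truck": "vehicle",
    "person": "person",
    "drivable": "drivable",
}
IGNORE = 255  # conventional ignore index for pixels with no unified label

UNIFIED = sorted(set(SOURCE_TO_UNIFIED.values()) | set(TARGET_TO_UNIFIED.values()))
UNIFIED_ID = {name: i for i, name in enumerate(UNIFIED)}

def relabel(label_name, mapping):
    # Map a dataset-specific class name to a unified integer id, or IGNORE.
    unified = mapping.get(label_name)
    return UNIFIED_ID[unified] if unified is not None else IGNORE

print(relabel("car", TARGET_TO_UNIFIED))  # same id as source "vehicle"
print(relabel("sky", SOURCE_TO_UNIFIED))  # 255: unmapped class is ignored
```

This is only the bookkeeping half of the label-level adaptation; the bilateral mixed sampling and the uncertainty-rectified contrastive loss from the abstract operate on top of such a unified space.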
Hard Occlusions in Visual Object Tracking
Visual object tracking is among the hardest problems in computer vision, as
trackers have to deal with many challenging circumstances such as illumination
changes, fast motion and occlusion, among others. A tracker is judged to be good
or not based on its performance on recent tracking datasets, e.g., VOT2019
and LaSOT. We argue that while the recent datasets contain large sets of
annotated videos that to some extent provide a large bandwidth for training
data, the hard scenarios such as occlusion and in-plane rotation are still
underrepresented. For trackers to be brought closer to the real-world scenarios
and deployed in safety-critical devices, even the rarest hard scenarios must be
properly addressed. In this paper, we particularly focus on hard occlusion
cases and benchmark the performance of recent state-of-the-art (SOTA) trackers
on them. We created a small-scale dataset containing different categories
within hard occlusions, on which the selected trackers are evaluated. Results
show that hard occlusions remain a very challenging problem for SOTA trackers.
Furthermore, it is observed that tracker performance varies wildly between
different categories of hard occlusions, where a top-performing tracker on one
category performs significantly worse on a different category. The varying
nature of tracker performance based on specific categories suggests that the
common tracker rankings using averaged single performance scores are not
adequate to gauge tracker performance in real-world scenarios.
Comment: Accepted at ECCV 2020 Workshop RLQ-TO
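The final point about averaged rankings can be made concrete with invented numbers: two trackers with identical mean scores can behave very differently per occlusion category, so the average alone cannot distinguish them.

```python
# Toy illustration (invented numbers): averaged single scores hide large
# per-category differences between trackers on hard occlusion subsets.
import numpy as np

categories = ["full occlusion", "partial occlusion", "similar occluder"]
tracker_a = np.array([0.60, 0.60, 0.60])  # uniform across categories
tracker_b = np.array([0.90, 0.75, 0.15])  # collapses on one category

print(f"{tracker_a.mean():.2f} {tracker_b.mean():.2f}")  # 0.60 0.60 -- same rank
print(categories[int(tracker_b.argmin())])               # where tracker B fails
```

A per-category breakdown, as the paper advocates, surfaces exactly the failure mode the mean conceals.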